Data Analyzer using Unity Catalog

In the data analyzer stage, you analyze the complete dataset against selected constraints. To do this, add the data analyzer node to the data quality stage and then create a data analyzer job.

Prerequisites

You must complete the following prerequisites before creating a data analyzer job:

  • The data quality nodes have specific requirements for the Databricks Runtime version and access mode of the cluster. The following requirements apply to a Unity Catalog-enabled Databricks cluster used by the data analyzer node in the data pipeline:

    Data Quality Node    Databricks Cluster Runtime Version    Access Mode
    Data Analyzer        12.2 LTS                              Dedicated
  • Access to a Databricks Unity Catalog node that is used as a data lake in the data ingestion pipeline.
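
The cluster requirement above can be sketched as a Databricks Clusters API payload. This is a minimal illustration, not the way Data Pipeline Studio provisions clusters: the cluster name, node type, and worker count are placeholder values, and `SINGLE_USER` is assumed here to be the API value corresponding to the Dedicated access mode (verify against your workspace).

```python
# Sketch of a cluster spec that satisfies the data analyzer requirements:
# Databricks Runtime 12.2 LTS with the Dedicated access mode.
cluster_spec = {
    "cluster_name": "dq-data-analyzer",   # hypothetical name
    "spark_version": "12.2.x-scala2.12",  # 12.2 LTS runtime
    "node_type_id": "i3.xlarge",          # placeholder instance type
    "num_workers": 2,
    # Assumed API value behind the Dedicated access mode
    "data_security_mode": "SINGLE_USER",
}

# In practice this spec would be POSTed to /api/2.0/clusters/create with an
# auth token; here we only check the fields the data quality stage cares about.
assert cluster_spec["spark_version"].startswith("12.2")
assert cluster_spec["data_security_mode"] == "SINGLE_USER"
```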

Creating a data analyzer job

  1. On the home page of Data Pipeline Studio, click the data analyzer node.

  2. Click the data analyzer node and click Create Job.

    Create Unity Catalog Data Analyzer Job

  3. On the Unity Catalog Analyzer Job tab, click Create Job.
  4. Complete the following steps to create the job:

Running the data analyzer job

You can run the data analyzer job in either of the following ways:

  • Publish the pipeline and click Run Pipeline.

  • Click the data analyzer node and click Start to initiate the Unity Catalog Data Analyzer Job run.
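
Since the analyzer job runs on Databricks, a run can also be triggered programmatically. The sketch below builds a request body for the Databricks Jobs API (`POST /api/2.1/jobs/run-now`); the job ID is a placeholder, and in practice it would come from the job created in Data Pipeline Studio.

```python
import json

# Hypothetical job ID of the Unity Catalog Data Analyzer Job.
payload = {"job_id": 123}

body = json.dumps(payload)
# The body would be sent with an auth header, e.g.:
# requests.post(f"{host}/api/2.1/jobs/run-now",
#               headers={"Authorization": f"Bearer {token}"}, data=body)
assert json.loads(body)["job_id"] == 123
```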

Viewing the results of the data analyzer job

  5. After the job run is successful, click the Unity Catalog Analyzer Result tab.

  6. Click View Analyzer Results.

  7. On the Output of Analyzer Runner screen, the SQL warehouse associated with this Unity Catalog instance is preselected. Select a job run from the Run Details dropdown list.

  8. You can view the results of the selected data analyzer job run and download them as a CSV file.
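
Once downloaded, the CSV can be post-processed with standard tooling. The sketch below parses a results file with Python's csv module; the column names (`column`, `metric`, `value`) are illustrative assumptions, not the actual schema produced by the Unity Catalog Data Analyzer.

```python
import csv
import io

# Stand-in for an open file handle on the downloaded results CSV;
# the rows below are fabricated examples of analyzer-style metrics.
sample = io.StringIO(
    "column,metric,value\n"
    "customer_id,Completeness,0.98\n"
    "order_total,Minimum,0.0\n"
)

rows = list(csv.DictReader(sample))

# Collect one metric of interest per column, e.g. completeness scores.
completeness = {r["column"]: float(r["value"])
                for r in rows if r["metric"] == "Completeness"}
```

With a real export, `sample` would simply be replaced by `open("results.csv")`.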

What's next? Data Issue Resolver using Unity Catalog